title: “BrightonAC Performance Analysis” author: “Daniel Kolosov” date: “2024-04-29” output: html_document theme: flatly runtime: shiny —

Trends in sprint performance across the outdoor season: When should Brighton attempt the 4 X 100m

Research question:When should Brighton run the 4 X 100? Sprinters peak in performance at varying times in the indoor/outdoor season. This has implications to the 4 X 100 meter relay. Athletes are steadfast on breaking their clubs records, so the athlete’s peaks should be accounted for when attempting to do this.

BrightonAC website
BrightonAC website

The data: The data is acquired via the ’power of 10 website. Official competitions record several facets of these competitions once they have commenced. These facets are: Time, event, date, venue. These are uploaded to the profiles of athletes. Once library are installed and loaded, the data can be scraped.

#These are the relevant packages to add.  
# If packages are not yet installed, this code installs them. 
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("shiny", quietly = TRUE)) install.packages("shiny")
if (!requireNamespace("rvest", quietly = TRUE)) install.packages("rvest")
if (!requireNamespace("lubridate", quietly = TRUE)) install.packages("lubridate")
if (!requireNamespace("plotly", quietly = TRUE)) install.packages("plotly")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("crosstalk", quietly = TRUE)) install.packages("crosstalk")

# Load libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(shiny)
library(rvest)
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(lubridate)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr)
library(crosstalk)
## 
## Attaching package: 'crosstalk'
## 
## The following object is masked from 'package:shiny':
## 
##     getDefaultReactiveDomain
library(knitr)
library(ggplot2)
library(plotly)

Next, the power of 10 data is scraped. There is an advantage to this. When the code runs, it takes the most up-to-date data. This means the data maintains its self. Furthermore, this is made more efficient by turning the athlete scraping code into a function. This is because it avoids duplicating the same scraping commands for each athlete. The variables scraped for are named as they are on power of 10. This enhanses clarity for those familiar with the website.

#html nodes and html text read data from a website. 
#data.frame puts the data together from the website into a dataframe 
#This is turned to a function for efficiency in adding athletes. 
scrape_athlete <- function(athlete_id) {
  link <- paste0("https://www.thepowerof10.info/athletes/profile.aspx?athleteid=", athlete_id, "&viewby=event")
  page <- read_html(link)
  data <- page %>% html_nodes("#cphBody_pnlPerformances")
  time <- data %>% html_nodes("td:nth-child(3)") %>% html_text()
  event <- data %>% html_nodes(".alternatingrowspanel td:nth-child(2)") %>% html_text()
  date <- data %>% html_nodes("td:nth-child(13)") %>% html_text()
  venue <- data %>% html_nodes("td:nth-child(11)") %>% html_text()
  name <- page %>% html_nodes("h2") %>% html_text() %>% head(1) 
  athleteID <- rep(athlete_id, length(date)) 
  data.frame(athleteID, name, date, time, venue, event, stringsAsFactors = FALSE)
}

# Use scrape athlete function to put some power of 10 profiles together. These are all BrightonAC sprinters. 
athletes <- c("1093555", "468971", "468972", "601046")
athlete_data <- map_df(athletes, scrape_athlete)

# Extract athlete names
athlete_names <- unique(athlete_data$name)

# Display a subset of rows with fewer values. This is what the data looks like now. 
kable(athlete_data[1:5, c("name", "date", "time", "event", "venue")])
name date time event venue
Daniel Kolosov Date Perf Event Venue
Daniel Kolosov 24 Feb 23 7.28 60 Sheffield
Daniel Kolosov 5 Mar 23 7.31 60 Carshalton
Daniel Kolosov 18 Jan 23 7.36 60 Newham
Daniel Kolosov 18 Jan 23 7.36 60 Newham

I am interested in the 100m. Thus, it is prudent to filter out other events. Venues are also of minor importance, and will thus be ignored. This will enhance clarity of the subsequent visualization and will decrease computational load to run this code.

# Filter for '100' events
athlete_data <- athlete_data %>%
  filter(event == "100")

# Filter dates between February and August for the years 2022 to 2023
athlete_data <- athlete_data %>%
  mutate(date = dmy(date),
         month_year = format(date, "%Y-%m")) %>%
  filter(year(date) %in% c(2022, 2023),
         month(date) %in% 2:8)

# Make time numerical 
athlete_data$time <- as.numeric(athlete_data$time)

# Calculate mean time for each athlete for each month
performances <- athlete_data %>%
  group_by(athleteID, month_year) %>%
  summarise(avg_time = mean(time))
## `summarise()` has grouped output by 'athleteID'. You can override using the
## `.groups` argument.
# Convert month_year to Date type
performances$month_year <- as.Date(paste0(performances$month_year, "-01"))

kable(athlete_data[1:5, c("name", "date", "time", "event")])
name date time event
Daniel Kolosov 2023-06-30 11.18 100
Daniel Kolosov 2023-05-29 11.19 100
Daniel Kolosov 2023-05-29 11.21 100
Daniel Kolosov 2023-06-24 11.21 100
Daniel Kolosov 2023-06-13 11.22 100

This is a plotly plot. it is suitable because it features a comparison tool which allows users to manually compare the athletes times. They can the calculate the lowest of these times for any given month. The downside of this plot is that the user must make their own calculation.

# Plot average time for each athlete over time, faceted by year
Linegraph <- performances %>%
  plot_ly(x = ~month_year, y = ~avg_time,
          color = ~athleteID, 
          type = 'scatter', mode = 'markers+lines',
          line = list(width = 1)) %>%
  layout(title = "Athletes' Average Time Over Time",
         xaxis = list(title = "Month-Year", tickformat = "%b"),
         yaxis = list(title = "Average Time"),
         facet_row = ~year(month_year))

# View the plot
Linegraph
## Warning: 'layout' objects don't have these attributes: 'facet_row'
## Valid attributes include:
## '_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'

To avoid the downside of the user making their own calculations, an R shiny can be made to dynamically apply calculations based on the athlete the user selects. The user interface will be made first.

# UI
#Use fluid page to adjust UI to users browser size. 
#Adding actions to user interface: choose athletes, calculate their fastest month and revert to original. 
#Using plotly line graph for the plot. 
ui <- fluidPage(
  titlePanel("BrightonAC 4 x 100m tool"),
  sidebarLayout(
    sidebarPanel(
      selectInput("athlete", "Select Athlete:",
                  choices = c("All", athlete_names),
                  selected = "All"),
      actionButton("calculate_button", "Identify best month to run"),
      actionButton("revert_button", "Revert to Original Plot")
    ),
    mainPanel(
      plotlyOutput("linegraph")
    )
  )
)
ui #Display user interface.

BrightonAC 4 x 100m tool

Then, it is important to make these calculations, and ensure they are visible on the plot for the user. Code is made for calculating the lowest average time across the months, as well as for reverting the plot pre-calculation button. When the calculation is present, the plot made visible to the user has an arrow on the bottom of the y axis. The arrow points to the fastest combined time, which is the lowest time. The dates are filtered to 2022-ongoing, meaning the plot does not require maintenance. This is an advantage of the plot.

# Server logic
#'input' from user, 'output' from code. 
server <- function(input, output, session) {
  
  # Make a placeholder for the original plot which is rendered later 
  original_plot <- reactiveVal(NULL)
  
  # Reactive function for scraping data based on the selected athlete
  athlete_data <- reactive({
    if (input$athlete == "All") {
      # If "All" athletes selected, scrape data for all athletes
      data <- map_df(athletes, scrape_athlete) %>%
        filter(event == "100") %>%
        mutate(date = dmy(date),
               year = year(date),
               month = month(date),
               month_year = format(date, "%Y-%m")) %>%
        filter(date >= ymd("2022-02-01")) %>%
        mutate(time = as.numeric(time)) %>%
        group_by(athleteID, name, year, month_year) %>%
        summarise(avg_time = mean(time))
      original_plot(data)  # Store original plot data
      return(data)
    } else {
      # If specific athlete selected, scrape data for that athlete
      athlete_id <- athletes[which(athlete_names == input$athlete)]
      data <- scrape_athlete(athlete_id) %>%
        filter(event == "100") %>%
        mutate(date = dmy(date),
               year = year(date),
               month = month(date),
               month_year = format(date, "%Y-%m")) %>%
        filter(date >= ymd("2022-02-01")) %>%
        mutate(time = as.numeric(time)) %>%
        group_by(athleteID, name, year, month_year) %>%
        summarise(avg_time = mean(time))
      original_plot(data)  # Store original plot data
      return(data)
    }
  })
  
  # Function to calculate combined performance
  calculate_combined_performance <- function(performances) {
    num_athletes <- length(unique(performances$name))
    month_counts <- performances %>%
      group_by(month_year) %>%
      summarise(num_data_points = n_distinct(name))
    valid_months <- month_counts %>%
      filter(num_data_points == num_athletes) %>%
      pull(month_year)
    valid_performances <- performances %>%
      filter(month_year %in% valid_months)
    combined_performance <- valid_performances %>%
      group_by(month_year) %>%
      summarise(combined_time = sum(avg_time))
    fastest_month <- combined_performance$month_year[which.min(combined_performance$combined_time)]
    return(fastest_month)
  }
  
  # Custom tick text for x-axis (setting the date parameters for the plot)
  custom_ticktext <- format(seq(ymd("2022-02-01"), Sys.Date(), by = "month"), "%b %Y")
  custom_ticktext[month(seq(ymd("2022-02-01"), Sys.Date(), by = "month")) %in% c(10:12, 1:2)] <- "Indoor szn"
  custom_ticktext[month(seq(ymd("2022-02-01"), Sys.Date(), by = "month")) %in% 3:9] <- format(seq(ymd("2022-02-01"), Sys.Date(), by = "month")[month(seq(ymd("2022-02-01"), Sys.Date(), by = "month")) %in% 3:9], "%b %Y")
  
  # Reactive function for the plot
  output$linegraph <- renderPlotly({
    performances <- athlete_data()
    if (input$calculate_button > 0) {
      # If calculate button clicked, calculate combined performance
      fastest_month <- calculate_combined_performance(performances)
      # Create plotly plot with annotations for fastest month
      plot <- plotly::plot_ly(data = performances,
                              x = ~month_year, y = ~avg_time,
                              color = ~name,
                              text = ~name,
                              type = 'scatter', mode = 'markers+lines',
                              line = list(width = 1),
                              showlegend = TRUE,
                              legendgroup = ~name) %>%
        layout(
          title = "Performance Across Time",
          xaxis = list(title = list(text = "Month-Year", font = list(family = "Helvetica", size = 20, color = "black")),
                       tickmode = "array", tickvals = seq(ymd("2022-02-01"), Sys.Date(), by = "month"), 
                       ticktext = custom_ticktext, font = list(family = "Helvetica", size = 14, color = "black"),
                       linecolor = "black", gridcolor = "white"),
          yaxis = list(title = list(text = "Average 100m time (sec)", font = list(family = "Helvetica", size = 20, color = "black")),
                       font = list(family = "Helvetica", size = 14, color = "black"),
                       linecolor = "black", gridcolor = "white"),
          plot_bgcolor = "#F5F5F5",  # very light grey
          paper_bgcolor = "#F5F5F5",  # very light grey
          legend = list(
            x = 1.05, y = 1,
            title = list(text = "Toggle athletes to compare between them", font = list(family = "Helvetica", size = 14, color = "black")),
            traceorder = "normal",
            font = list(family = "Helvetica", size = 12, color = "black"),
            bgcolor = "white",
            bordercolor = "black",
            borderwidth = 1
          )
        ) %>%
        add_annotations(
          x = fastest_month, y = -Inf, 
          text = paste("Fastest Month:", fastest_month),
          showarrow = TRUE,
          arrowhead = 2,
          arrowsize = 1,
          arrowwidth = 2,
          arrowcolor = "red",
          ax = 0,
          ay = -30,
          font = list(family = "Helvetica", size = 12, color = "black")
        )
    } else {
      # If calculate button not clicked, create regular plotly plot
      plot <- plotly::plot_ly(data = performances,
                              x = ~month_year, y = ~avg_time,
                              color = ~name,
                              text = ~name,
                              type = 'scatter', mode = 'markers+lines',
                              line = list(width = 1),
                              showlegend = TRUE,
                              legendgroup = ~name) %>%
        layout(
          title = "Performance Across Time",
          xaxis = list(title = list(text = "Month-Year", font = list(family = "Helvetica", size = 20, color = "black")),
                       tickmode = "array", tickvals = seq(ymd("2022-02-01"), Sys.Date(), by = "month"), 
                       ticktext = custom_ticktext, font = list(family = "Helvetica", size = 14, color = "black"),
                       linecolor = "black", gridcolor = "white"),
          yaxis = list(title = list(text = "Average 100m time (sec)", font = list(family = "Helvetica", size = 20, color = "black")),
                       font = list(family = "Helvetica", size = 14, color = "black"),
                       linecolor = "black", gridcolor = "white"),
          plot_bgcolor = "#F5F5F5",  # very light grey
          paper_bgcolor = "#F5F5F5",  # very light grey
          legend = list(
            x = 1.05, y = 1,
            title = list(text = "Toggle athletes to compare between them", font = list(family = "Helvetica", size = 14, color = "black")),
            traceorder = "normal",
            font = list(family = "Helvetica", size = 12, color = "black"),
            bgcolor = "white",
            bordercolor = "black",
            borderwidth = 1
          )
        )
    }
    
    plot %>% layout(margin = list(t = 100))  # Add margin to the top
  })
  
  # If revert button clicked, reset plot to original
  observeEvent(input$revert_button, {
    original_data <- original_plot()
    if (!is.null(original_data)) {
      output$linegraph <- renderPlotly({
        plot <- plotly::plot_ly(data = original_data,
                                x = ~month_year, y = ~avg_time,
                                color = ~name,
                                text = ~name,
                                type = 'scatter', mode = 'markers+lines',
                                line = list(width = 1),
                                showlegend = TRUE,
                                legendgroup = ~name) %>%
          layout(
            title = "Performance Across Time",
            xaxis = list(title = list(text = "Month-Year", font = list(family = "Helvetica", size = 20, color = "black")),
                         tickmode = "array", tickvals = seq(ymd("2022-02-01"), Sys.Date(), by = "month"), 
                         ticktext = custom_ticktext, font = list(family = "Helvetica", size = 14, color = "black"),
                         linecolor = "black", gridcolor = "white"),
            yaxis = list(title = list(text = "Average 100m time (sec)", font = list(family = "Helvetica", size = 20, color = "black")),
                         font = list(family = "Helvetica", size = 14, color = "black"),
                         linecolor = "black", gridcolor = "white"),
            plot_bgcolor = "#F5F5F5",  # very light grey
            paper_bgcolor = "#F5F5F5",  # very light grey
            legend = list(
              x = 1.05, y = 1,
              title = list(text = "Toggle athletes to compare between them", font = list(family = "Helvetica", size = 14, color = "black")),
              traceorder = "normal",
              font = list(family = "Helvetica", size = 12, color = "black"),
              bgcolor = "white",
              bordercolor = "black",
              borderwidth = 1
            )
          )
        
        plot %>% layout(margin = list(t = 100))  # Add margin to the top
      })
    }
  })
}

# Run the application 
shinyApp(ui = ui, server = server)
Shiny applications not supported in static R Markdown documents

Conclusion

The teams fastest combined time together was in May 2023. When looking manually at 2022, the team is also generally peaking in may also. Thus, it can be concluded that the team should attend a competition in May 2024 and attempt to break their record while at combined peak condition.

By plotting performances according to date, this is an insight which can be made with little effort. Such plots have implications for coaches and athletes.